Depth estimation is a challenging task in 3D reconstruction that improves the accuracy of environment-aware sensing. This work presents a new solution with a set of improvements over existing methods, which increases the quantitative and qualitative understanding of depth maps. Recently, convolutional neural networks (CNNs) have demonstrated a remarkable ability to estimate depth maps from monocular images. However, traditional CNNs do not support topological structures; they can work only on regular image regions with determined sizes and weights. In contrast, graph convolutional networks (GCNs) can handle convolution on non-Euclidean data and can be applied to irregular image regions within a topological structure. Therefore, in this work, in order to preserve objects' geometric appearance and distribution, we aim at exploiting GCNs for a self-supervised depth estimation model. Our model consists of two parallel auto-encoder networks: the first is an auto-encoder that relies on ResNet-50 to extract features from the input image and on a multi-scale GCN to estimate the depth map. In turn, the second network, based on ResNet-18, is used to estimate the ego-motion vector (i.e., 3D pose) between two consecutive frames. Both the estimated 3D pose and the depth map are used to reconstruct the target image. A combination of loss functions related to photometric consistency, projection, and smoothness is used to cope with bad depth predictions and to preserve the discontinuities of objects. In particular, our method provides comparable and promising results, with a high prediction accuracy of 89% on public benchmark datasets, including Make3D, along with a 40% reduction in the number of trainable parameters compared to state-of-the-art solutions. The source code is publicly available at https://github.com/arminmasoumian/gcndepth.git
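The combined objective described above can be illustrated with a minimal NumPy sketch; the simplified edge-aware smoothness form and the weighting factor `lambda_s` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def photometric_loss(target, reconstructed):
    """Mean absolute photometric error between the target and the warped image."""
    return np.mean(np.abs(target - reconstructed))

def smoothness_loss(depth, image):
    """Edge-aware smoothness: penalize depth gradients, downweighted at image edges."""
    d_dx = np.abs(np.diff(depth, axis=1))
    d_dy = np.abs(np.diff(depth, axis=0))
    i_dx = np.mean(np.abs(np.diff(image, axis=1)), axis=-1)  # average over channels
    i_dy = np.mean(np.abs(np.diff(image, axis=0)), axis=-1)
    return np.mean(d_dx * np.exp(-i_dx)) + np.mean(d_dy * np.exp(-i_dy))

def total_loss(target, reconstructed, depth, lambda_s=0.001):
    """Photometric term plus a weighted smoothness term (lambda_s is assumed)."""
    return photometric_loss(target, reconstructed) + lambda_s * smoothness_loss(depth, target)
```

A perfectly flat depth map incurs zero smoothness penalty, so the total loss reduces to the photometric term.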
The inverted pendulum is a nonlinear, unbalanced system that requires motor control to achieve stability and balance. The inverted pendulum is constructed with LEGO and uses the LEGO Mindstorms NXT, a programmable robot capable of performing many different functions. In this paper, an initial design of the inverted pendulum is presented, and the performance of different sensors compatible with the LEGO Mindstorms NXT is studied. In addition, the ability of computer vision to implement and maintain the stability required by the system is also investigated. The inverted pendulum is a traditional cart that can be controlled using a fuzzy logic controller, which generates a self-tuning PID control for the cart to keep moving forward. The fuzzy logic and PID are simulated in MATLAB and Simulink, and the program for the robot is developed in the LabVIEW software.
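A discrete PID update of the kind the fuzzy layer retunes can be sketched as follows; the gain values and fixed time step in the usage below are illustrative assumptions, since in the paper the gains are adjusted online by the fuzzy logic controller:

```python
class PID:
    """Minimal discrete PID controller; a fuzzy logic layer would retune kp/ki/kd online."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """One control update: returns the actuation command for this time step."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

For the pendulum, `setpoint` would be the upright angle and `measurement` the sensed tilt, with the motor driven by the returned command.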
The purpose of this paper is to describe a method for detecting slip and contact force with real-time feedback. In this novel approach, a DAVIS camera is used as a vision-based tactile sensor, owing to its fast processing speed and high resolution. Two hundred experiments were conducted on four objects with different shapes, sizes, weights, and materials to compare the accuracy and response of the Baxter robot gripper in avoiding slip. The proposed method was validated using a force-sensitive resistor (FSR402). The events captured by the DAVIS camera were processed with a specific algorithm to provide feedback to the Baxter robot, allowing it to detect slip.
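One common way to turn an event stream into a slip signal is to threshold the event rate in the contact region, since slip produces a burst of brightness-change events; the toy version below illustrates that idea only, and its windowing scheme and threshold are assumptions rather than the paper's specific algorithm:

```python
def detect_slip(event_timestamps, window, rate_threshold):
    """Return True if the event rate (events/second) inside any sliding time
    window exceeds the threshold, suggesting slip-induced motion.

    event_timestamps: iterable of event times in seconds.
    window: sliding window length in seconds.
    """
    events = sorted(event_timestamps)
    start = 0
    for end in range(len(events)):
        # Shrink the window from the left until it spans at most `window` seconds.
        while events[end] - events[start] > window:
            start += 1
        rate = (end - start + 1) / window
        if rate > rate_threshold:
            return True
    return False
```

A stable grasp produces sparse events and stays below the threshold; a slipping object produces a dense burst that trips it.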
Machine learning models based on sensitive data hold real-world promise for advances in areas ranging from medical screening to disease outbreak detection, agriculture, industry, and defense science. In many applications, learning participants would benefit from pooling their private datasets, training detailed machine learning models on the combined real data, and sharing the benefits of using these models. Due to existing privacy and security concerns, most parties avoid sharing sensitive data for training. Federated learning allows parties to jointly train a machine learning algorithm on their shared data without each user revealing their local data to a central server. This collaborative privacy-preserving learning approach incurs significant communication during training. Most large-scale machine learning applications require decentralized learning over datasets generated on various devices and in various locations. Such datasets represent a fundamental obstacle to decentralized learning, because their diverse environments contribute to significant differences in data distribution across devices and locations. Researchers have proposed several methods to achieve data privacy in federated learning systems; however, challenges posed by non-homogeneous local data still remain. This research's approach is to select the nodes (users) that share their data in federated learning so as to improve accuracy, reduce training time, and increase convergence, based on a balanced, independent data distribution. Therefore, this study introduces DQRE-SCNet, a Deep Q-Reinforcement learning Ensemble combined with Spectral Clustering, to select a subset of devices in each communication round. Based on the results, we demonstrate that the number of communication rounds required for federated learning can be reduced.
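DQRE-SCNet's selection policy is learned, but the spectral-clustering half of the idea can be illustrated with a rough sketch: partition devices by their data-distribution features and pick a balanced subset across the partitions. The RBF similarity, the two-way split via the Fiedler vector, and the round-robin pick are all simplifying assumptions:

```python
import numpy as np

def spectral_select(features, n_select):
    """Split devices into two groups via the Fiedler vector of the graph
    Laplacian, then pick representatives alternately from each group so the
    selected subset covers both data distributions."""
    # Similarity graph from pairwise squared distances (RBF kernel, bandwidth 1).
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2)
    L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)               # eigenvectors, ascending eigenvalues
    fiedler = vecs[:, 1]                      # second-smallest eigenvector
    groups = [np.where(fiedler <= 0)[0], np.where(fiedler > 0)[0]]
    # Interleave members of the two groups, then take the first n_select.
    order = []
    for k in range(max(len(groups[0]), len(groups[1]))):
        for g in groups:
            if k < len(g):
                order.append(int(g[k]))
    return order[:n_select]
```

With two well-separated device populations, selecting two devices returns one from each, which is the balancing behavior the selection step is after.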
The distance between objects in a scene and a camera sensor is determined from 2D images by estimating a depth image, typically using a stereo camera or a 3D camera. The result of depth estimation is a relative distance, which can be used to calculate the absolute distance that is applicable in practice. However, distance estimation is very challenging with a 2D monocular camera. This paper presents a deep learning framework, consisting of two deep networks, for depth estimation and object detection using a single image. First, objects in the scene are detected and localized using a You Only Look Once (YOLOv5) network. In parallel, an estimated depth image is computed using a deep autoencoder network to obtain relative distances. The YOLO-based object detection network is trained with a supervised learning technique; in turn, the depth estimation network is trained in a self-supervised manner. The presented distance estimation framework is evaluated on real images of outdoor scenes. The achieved results show that the framework is promising, with 96% accuracy and an RMSE of 0.203 for correct absolute distances.
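Converting a relative depth map into absolute distances requires recovering a global scale; a minimal sketch using a single known reference measurement is given below. The single-point scaling scheme and the median-over-box readout are common conventions assumed here, not necessarily the paper's exact procedure:

```python
import numpy as np

def absolute_depth(relative_depth, known_distance, known_pixel):
    """Scale a relative depth map so that the depth at `known_pixel` (row, col)
    matches a measured ground-truth distance (e.g., in meters)."""
    scale = known_distance / relative_depth[known_pixel]
    return relative_depth * scale

def object_distance(abs_depth, box):
    """Median absolute depth inside a YOLO-style (x1, y1, x2, y2) bounding box,
    which is robust to background pixels at the box edges."""
    x1, y1, x2, y2 = box
    return float(np.median(abs_depth[y1:y2, x1:x2]))
```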
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields 2$\times$ speedup on the long-range arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
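The efficiency argument rests on computing the SSM's long convolution with FFTs in O(N log N) time rather than O(N^2); a minimal NumPy sketch of that core operation follows (FlashConv's fused block FFT and state-passing algorithms are not reproduced here):

```python
import numpy as np

def fft_conv(u, k):
    """Causal convolution of input u with SSM kernel k via FFT, O(N log N).

    Zero-padding to 2N avoids the circular wraparound of a plain FFT product.
    """
    n = len(u)
    fft_size = 2 * n
    y = np.fft.irfft(np.fft.rfft(u, fft_size) * np.fft.rfft(k, fft_size), fft_size)
    return y[:n]

def direct_conv(u, k):
    """Reference O(N^2) causal convolution for checking the FFT path."""
    n = len(u)
    return np.array([sum(u[j] * k[i - j] for j in range(i + 1)) for i in range(n)])
```

The two paths agree numerically, which is what makes the FFT route a drop-in replacement at long sequence lengths.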
Recently, Smart Video Surveillance (SVS) systems have been receiving more attention among scholars and developers as a substitute for the current passive surveillance systems. These systems are used to make policing and monitoring systems more efficient and to improve public safety. However, the nature of these systems in monitoring the public's daily activities brings different ethical challenges. There are different approaches for addressing privacy issues in implementing SVS. In this paper, we focus on the role of design in addressing the ethical and privacy challenges in SVS. Reviewing four policy protection regulations that generate an overview of best practices for privacy protection, we argue that ethical and privacy concerns can be addressed through four lenses: algorithm, system, model, and data. As a case study, we describe our proposed system and illustrate how it can serve as a baseline for designing a privacy-preserving system that delivers safety to society. We used several artificial intelligence algorithms, such as object detection, single- and multi-camera re-identification, action recognition, and anomaly detection, to provide a basic functional system. We also use cloud-native services to implement a smartphone application in order to deliver the outputs to the end users.
In recent years, nonlinear model predictive control (NMPC) has been extensively used for solving automotive motion control and planning tasks. In order to formulate the NMPC problem, different coordinate systems can be used with different advantages. We propose and compare formulations for the NMPC related optimization problem, involving a Cartesian and a Frenet coordinate frame (CCF/ FCF) in a single nonlinear program (NLP). We specify costs and collision avoidance constraints in the more advantageous coordinate frame, derive appropriate formulations and compare different obstacle constraints. With this approach, we exploit the simpler formulation of opponent vehicle constraints in the CCF, as well as road aligned costs and constraints related to the FCF. Comparisons to other approaches in a simulation framework highlight the advantages of the proposed approaches.
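Mixing a Cartesian and a Frenet frame in one NLP requires mapping between the two; a minimal sketch of the Frenet-to-Cartesian direction for a polyline reference path is shown below. The piecewise-linear reference and the left-positive lateral convention are illustrative assumptions, not the paper's formulation:

```python
import math

def frenet_to_cartesian(s, d, ref_xy, ref_s):
    """Convert Frenet coordinates (s: arclength, d: lateral offset) to a
    Cartesian point along a polyline reference path.

    ref_xy: list of (x, y) reference points; ref_s: their cumulative arclengths.
    """
    # Locate the polyline segment containing arclength s.
    i = max(j for j in range(len(ref_s)) if ref_s[j] <= s)
    i = min(i, len(ref_xy) - 2)
    (x0, y0), (x1, y1) = ref_xy[i], ref_xy[i + 1]
    t = (s - ref_s[i]) / (ref_s[i + 1] - ref_s[i])
    x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
    # Offset by d along the left-pointing unit normal of the segment.
    heading = math.atan2(y1 - y0, x1 - x0)
    return (x - d * math.sin(heading), y + d * math.cos(heading))
```

Road-aligned costs (e.g., lateral deviation bounds) act directly on `d`, while obstacle positions known in Cartesian coordinates can be constrained on the converted `(x, y)`.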
Automated Program Repair (APR) is defined as the process of fixing a bug/defect in the source code by an automated tool. APR tools have recently achieved promising results by leveraging state-of-the-art Neural Language Processing (NLP) techniques. APR tools such as TFix and CodeXGLUE, which combine text-to-text transformers with software-specific techniques, currently outperform alternatives. However, in most APR studies the train and test sets are chosen from the same set of projects. In reality, however, APR models are meant to generalize to new and different projects. Therefore, there is a potential threat that APR models reported as highly effective perform poorly when the characteristics of a new project or its bugs differ from those of the training set (domain shift). In this study, we first define and measure the domain shift problem in automated program repair. We then propose a domain adaptation framework that can adapt an APR model to a given target project. We conduct an empirical study with three domain adaptation methods, FullFineTuning, TuningWithLightWeightAdapterLayers, and CurriculumLearning, using two state-of-the-art APR models (TFix and CodeXGLUE) on 611 bugs from 19 projects. The results show that our proposed framework can improve the effectiveness of TFix by 13.05% and CodeXGLUE by 23.4%. Another contribution of this study is the proposal of a data synthesis method to address the lack of labelled data in APR. We leverage transformers to create a bug generator model. We use the generated synthetic data to domain-adapt TFix and CodeXGLUE on projects with no data (zero-shot learning), which results in an average improvement of 5.76% and 24.42% for TFix and CodeXGLUE, respectively.
In recent years, we have seen a significant interest in data-driven deep learning approaches for video anomaly detection, where an algorithm must determine if specific frames of a video contain abnormal behaviors. However, video anomaly detection is particularly context-specific, and the availability of representative datasets heavily limits real-world accuracy. Additionally, the metrics currently reported by most state-of-the-art methods often do not reflect how well the model will perform in real-world scenarios. In this article, we present the Charlotte Anomaly Dataset (CHAD). CHAD is a high-resolution, multi-camera anomaly dataset in a commercial parking lot setting. In addition to frame-level anomaly labels, CHAD is the first anomaly dataset to include bounding box, identity, and pose annotations for each actor. This is especially beneficial for skeleton-based anomaly detection, which is useful for its lower computational demand in real-world settings. CHAD is also the first anomaly dataset to contain multiple views of the same scene. With four camera views and over 1.15 million frames, CHAD is the largest fully annotated anomaly detection dataset including person annotations, collected from continuous video streams from stationary cameras for smart video surveillance applications. To demonstrate the efficacy of CHAD for training and evaluation, we benchmark two state-of-the-art skeleton-based anomaly detection algorithms on CHAD and provide comprehensive analysis, including both quantitative results and qualitative examination.